Jupyter Notebook Tutorial

To Do

  • Python markdown?
  • Interactive color picker?
  • Even more pandas?

Getting set up

  • Install Anaconda

    • I'd recommend getting the latest version of Python (version 3.6 at time of writing).
    • Also use this to get all the pythons:
      # install everything with Python 2 and 3. 
      conda create -n py36 python=3.6 anaconda
      conda create -n py27 python=2.7 anaconda
      # register py27 kernel - no need for "source" on windows
      source activate py27
      ipython kernel install
      # same for py36, and install jupyterhub in the py36 env
      source activate py36
      ipython kernel install
      pip install jupyterhub
      

  • Install necessary packages with:

    • pip install insert_package_name_here
    • You might have to preface that with sudo if you're on a Mac.
    • Alternatively, use conda install insert_package_name_here if you run into issues with pip
    • conda install -c conda-forge insert_package_name_here is also an option for certain packages.

  • You're probably going to want the following packages (though some may already be installed via Anaconda):

    • bokeh
    • holoviews
    • jupyter
    • jupyter_contrib_nbextensions
      • Run the following command for this after install: jupyter contrib nbextension install --user
    • jupyterthemes
      • Use if you're not happy with the default aesthetics of the notebook
      • Run at terminal for (most of) my aesthetic setup: jt -t grade3 -fs 12 -tfs 12 -nfs 115 -cellw 88% -T
      • If you don't like it, you can always go back to the default: jt -r
    • matplotlib
    • nbopen
      • Used to associate .ipynb files with Jupyter in your file manager
        • Linux/BSD: python -m nbopen.install_xdg
        • Windows: python -m nbopen.install_win
        • Mac: Clone the repository and run ./osx-install.sh
    • numpy
    • pandas
    • pivottablejs
    • prettypandas
    • matlab_kernel and pymatbridge
      • For using MATLAB
      • If pymatbridge doesn't work, go to matlabroot\extern\engines\python and run python setup.py install
    • rpy2
      • For using R
      • More instructions in relevant section below
    • scipy
    • seaborn
    • statsmodels
    • wes
      • Optional package for Wes Anderson-style color palettes

  • Open Jupyter notebook from terminal or cmd:

    • jupyter lab or jupyter notebook
      • Make sure to cd into the directory you want to run it in (or at least a directory higher than the one you want; you can't go higher from within the notebook instance, nor can you go laterally!)
      • You can switch between views by navigating to http://localhost:8888/lab or http://localhost:8888/tree respectively.

  • Enable your favorite nbextensions (below I've listed mine).
    • Tree Filter
    • table_beautifier
    • Variable inspector
    • Codefolding
    • Chrome clipboard
    • Codefolding in editor
    • contrib_nbextensions_help_item
    • nbextensions dashboard tab
    • Collapsible Headings, with add a control, adjust size of toggle controls, gray bracketed ellipsis, command-mode, collapse with ToC2
    • Python Markdown
      • must be trusted notebook to use properly -- enable trust at top-right of notebook
    • Table of Contents (2), with auto-number, sidebar, widen display, display toc as navigation menu, move title and menu left instead of center, and collapse
      • can export notebook to HTML with table of contents with: jupyter nbconvert --to html_toc FILENAME.ipynb
      • if you get an error that says "No such module as 'pre_pymarkdown'", then you will need to do the following:
        • find "pre_pymarkdown.py" on your computer and add it to the PYTHONPATH environment variable
        • add the following to your "jupyter_nbconvert_config.py" file:
          c = get_config()
          c.Exporter.preprocessors = ['pre_pymarkdown.PyMarkdownPreprocessor']
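
A quick way to see which of the packages listed above are importable from the active environment is sketched below. The helper name `check_packages` and the short package list are illustrative assumptions, not part of the original setup. Keep in mind that a package's import name can differ from its install name.

```python
import importlib.util
import sys


def check_packages(names):
    """Return {name: True/False} depending on whether each module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Hypothetical subset of the packages listed above; edit to taste.
wanted = ["numpy", "pandas", "matplotlib", "seaborn", "bokeh"]

print("Python", sys.version.split()[0], "at", sys.executable)
for name, found in check_packages(wanted).items():
    print("{:<12} {}".format(name, "ok" if found else "MISSING"))
```

Running this once in each conda environment (py27, py36) should show whether the environment you activated is the one that actually has the packages.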
          

Markdown Tutorial

Double-click on this cell to see how everything was written!

Headings are made with preceding "#" signs. <h1> is #, <h2> is ##, etc.

White space

Force new blank lines with <br> .

Emphasis

Italics are made by surrounding a word or phrase with asterisks, or with underscores, like so.

Bold words are made by surrounding a word or phrase with 2 asterisks on each end.

You can make a phrase both bold and italic by combining the above!

Unordered Lists

  • Dashes make bullets
    • And tabbing first makes a sub-bullet
      • You can also just use a single space instead of a tab character; just be consistent.

Ordered Lists

  1. You can make ordered lists with a number followed by a dot.
  2. Here's another point.

Blockquotes

Put a ">" before a line to turn it into a blockquote.

Code

Unhighlighted code goes between backticks: this is code

And you can define blocks of code by sandwiching them between 3 backticks on either end (you can even define syntax highlighting!)

x = [1, 2, 3]
for i in x:
    print(i)


Hyperlinks go in square brackets, with the link itself going in parentheses immediately after (no whitespace allowed between neighboring brackets)!

Images are set up just like hyperlinks, but with an exclamation point in front. The writing in square brackets serves as the alt-text for the image.

Yale Psychology Department
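
Because the rendered cell hides the raw syntax, here is the link and image syntax spelled out as plain strings. The helper names `md_link` and `md_image` and the example.com addresses are made up for illustration.

```python
def md_link(text, url):
    """Build Markdown link syntax: [text](url)."""
    return "[{}]({})".format(text, url)


def md_image(alt_text, url):
    """Build Markdown image syntax: the same as a link, with a leading '!'."""
    return "!" + md_link(alt_text, url)


# example.com is a placeholder address, not a link from this tutorial.
print(md_link("Example site", "https://example.com"))
print(md_image("Example logo", "https://example.com/logo.png"))
```

Inside a notebook, strings like these can be rendered from a code cell by passing them to `IPython.display.Markdown`.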

Embed HTML, including video

In [1]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/HW29067qVWk" frameborder="0" allowfullscreen></iframe>

Jupyter commands

Magic commands

See all commands.

In [2]:
lsmagic
Out[2]:
Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cd  %clear  %cls  %colors  %config  %connect_info  %copy  %ddir  %debug  %dhist  %dirs  %doctest_mode  %echo  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %macro  %magic  %matplotlib  %mkdir  %more  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %ren  %rep  %rerun  %reset  %reset_selective  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%cmd  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.

Terminal commands

And run terminal commands directly with "!"

In [3]:
!pip list
alabaster (0.7.10)
anaconda-client (1.6.5)
anaconda-navigator (1.6.8)
anaconda-project (0.8.0)
asn1crypto (0.22.0)
astroid (1.5.3)
astropy (2.0.2)
babel (2.5.0)
backports.shutil-get-terminal-size (1.0.0)
beautifulsoup4 (4.6.0)
bitarray (0.8.1)
bkcharts (0.2)
blaze (0.11.3)
bleach (2.0.0)
bokeh (0.12.13)
boto (2.48.0)
Bottleneck (1.2.1)
CacheControl (0.12.3)
certifi (2017.7.27.1)
cffi (1.10.0)
chardet (3.0.4)
click (6.7)
cloudpickle (0.4.0)
clyent (1.2.2)
colorama (0.3.9)
comtypes (1.1.2)
conda (4.4.7)
conda-build (3.0.22)
conda-verify (2.0.0)
contextlib2 (0.5.5)
cryptography (2.0.3)
cx-Freeze (5.0.2)
cx-Oracle (6.1)
cycler (0.10.0)
Cython (0.26.1)
cytoolz (0.8.2)
dask (0.15.2)
datashape (0.5.4)
decorator (4.1.2)
distlib (0.2.5)
distributed (1.18.3)
docutils (0.14)
entrypoints (0.2.3)
et-xmlfile (1.0.1)
fastcache (1.0.2)
filelock (2.0.12)
Flask (0.12.2)
Flask-Cors (3.0.3)
gevent (1.2.2)
glob2 (0.5)
greenlet (0.4.12)
h5py (2.7.0)
heapdict (1.0.0)
holoviews (1.9.2)
html5lib (0.999999999)
idna (2.6)
imageio (2.2.0)
imagesize (0.7.1)
ipykernel (4.6.1)
ipython (6.1.0)
ipython-genutils (0.2.0)
ipywidgets (7.0.1)
isort (4.2.15)
itsdangerous (0.24)
jdcal (1.3)
jedi (0.10.2)
Jinja2 (2.10)
jsonschema (2.6.0)
jupyter (1.0.0)
jupyter-client (5.1.0)
jupyter-console (5.2.0)
jupyter-contrib-core (0.3.3)
jupyter-contrib-nbextensions (0.3.1)
jupyter-core (4.3.0)
jupyter-highlight-selected-word (0.0.11)
jupyter-latex-envs (1.3.8.4)
jupyter-nbextensions-configurator (0.2.8)
jupyterlab (0.28.11)
jupyterlab-launcher (0.5.5)
jupyterthemes (0.18.2)
Kivy (1.10.1.dev0)
Kivy-Garden (0.1.4)
kivy.deps.glew (0.1.9)
kivy.deps.gstreamer (0.1.12)
kivy.deps.sdl2 (0.1.17)
lazy-object-proxy (1.3.1)
lesscpy (0.12.0)
llvmlite (0.20.0)
locket (0.2.0)
lockfile (0.12.2)
lxml (3.8.0)
MarkupSafe (1.0)
matlab-kernel (0.15.0)
matlabengineforpython (R2017b)
matplotlib (2.0.2)
mccabe (0.6.1)
menuinst (1.4.8)
metakernel (0.20.8)
mistune (0.7.4)
mpld3 (0.3)
mpmath (0.19)
msgpack-python (0.4.8)
multipledispatch (0.4.9)
navigator-updater (0.1.0)
nbconvert (5.3.1)
nbformat (4.4.0)
networkx (1.11)
nltk (3.2.4)
nose (1.3.7)
notebook (5.0.0)
numba (0.35.0+10.g143f70e)
numexpr (2.6.2)
numpy (1.14.0)
numpydoc (0.7.0)
odo (0.5.1)
olefile (0.44)
openpyxl (2.4.8)
packaging (16.8)
pandas (0.20.3)
pandocfilters (1.4.2)
param (1.5.1)
partd (0.3.8)
path.py (10.3.1)
pathlib2 (2.3.0)
patsy (0.4.1)
pep8 (1.7.0)
pexpect (4.2.1)
pickleshare (0.7.4)
Pillow (4.2.1)
pip (9.0.1)
pivottablejs (0.9.0)
pkginfo (1.4.1)
plotly (2.0.16)
ply (3.10)
prettypandas (0.0.3)
progress (1.3)
prompt-toolkit (1.0.15)
psutil (5.2.2)
ptyprocess (0.5.2)
py (1.4.34)
pycodestyle (2.3.1)
pycosat (0.6.3)
pycparser (2.18)
pycrypto (2.6.1)
pycurl (7.43.0)
pydotplus (2.0.2)
pyflakes (1.5.0)
Pygments (2.2.0)
pylint (1.7.2)
pymarkdown (0.1.4)
pymatbridge (0.5.2)
pyodbc (4.0.17)
pyOpenSSL (17.2.0)
pyparsing (2.2.0)
pypiwin32 (220)
PySocks (1.6.7)
pytest (3.2.1)
python-dateutil (2.6.1)
pytz (2017.2)
PyWavelets (0.5.2)
pywin32 (221)
PyYAML (3.12)
pyzmq (16.0.2)
QtAwesome (0.4.4)
qtconsole (4.3.1)
QtPy (1.3.1)
requests (2.18.4)
rope (0.10.5)
rpy2 (2.8.6)
ruamel-yaml (0.11.14)
scikit-image (0.13.0)
scikit-learn (0.19.0)
scipy (0.19.1)
seaborn (0.8)
setuptools (38.4.0)
simplegeneric (0.8.1)
singledispatch (3.4.0.3)
six (1.11.0)
snowballstemmer (1.2.1)
sortedcollections (0.5.3)
sortedcontainers (1.5.7)
Sphinx (1.6.3)
sphinxcontrib-websupport (1.0.1)
spyder (3.2.3)
SQLAlchemy (1.1.13)
statsmodels (0.8.0)
sympy (1.1.1)
tables (3.4.2)
tblib (1.3.2)
testpath (0.3.1)
toolz (0.8.2)
torch (0.2.1+a4fc05a)
torchvision (0.1.9)
tornado (4.5.3)
traitlets (4.3.2)
typing (3.6.2)
unicodecsv (0.14.1)
urllib3 (1.22)
wcwidth (0.1.7)
webencodings (0.5.1)
Werkzeug (0.12.2)
wes (0.1.5)
wheel (0.29.0)
widgetsnbextension (3.0.2)
win-inet-pton (1.0.1)
win-unicode-console (0.5)
wincertstore (0.2)
wrapt (1.10.11)
xlrd (1.1.0)
XlsxWriter (0.9.8)
xlwings (0.11.4)
xlwt (1.3.0)
zict (0.1.2)
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
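
The "!" prefix only works inside IPython and Jupyter. In a plain Python script, a rough equivalent is the standard library's subprocess module. The sketch below runs a harmless command (`python --version`), chosen purely for illustration.

```python
import subprocess
import sys

# Run "python --version" with the same interpreter that is running this code.
result = subprocess.run(
    [sys.executable, "--version"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,   # some Python versions print the version to stderr
    universal_newlines=True,    # decode the captured bytes to str
)

print(result.returncode)
print(result.stdout.strip())
```

Swapping the argument list for `[sys.executable, "-m", "pip", "list"]` should list the packages for that same interpreter, which avoids accidentally calling a pip that belongs to a different environment.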

Helpful shortcuts

  • While coding, SHIFT+TAB will bring up help for your current function
  • CTRL+Enter executes the current cell, keeping your focus on it
  • SHIFT+Enter executes the current cell, and moves you down to the next cell
  • ALT+Enter executes the current cell AND makes a new one below
  • ESC brings you to command mode, where you can do a number of things:
    • A makes a new cell above
    • B makes a new cell below
    • D D (that's D twice) deletes a cell
    • Y turns the cell into code
    • M turns the cell into Markdown
  • CTRL+SHIFT+F brings up the command palette, with all available commands
  • You can also edit such shortcuts from the "Help" menu at the top of the screen

Data analysis with pandas

Giant pandas tutorial and attendant notes.

Setup

Allow plots in the notebook itself, and enable some helpful functions.

In [4]:
%reset -f
%matplotlib inline
%config InlineBackend.figure_format = 'retina' # High-res graphs (rendered irrelevant by svg option below)
%config InlineBackend.print_figure_kwargs = {'bbox_inches':'tight'} # No extra white space
%config InlineBackend.figure_format = 'svg' # 'png' is default

import warnings
warnings.filterwarnings('ignore') # Because we are adults
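
Ignoring every warning for the whole session can hide real problems. A narrower alternative, sketched here with a made-up `noisy` function, is to suppress warnings only around the call that produces them.

```python
import warnings


def noisy():
    """Hypothetical function that emits a warning, for demonstration only."""
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 42


# Warnings are silenced only inside the with-block;
# the previous filters are restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    value = noisy()

print(value)
```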


Import example data.

In [5]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = sns.load_dataset('tips')
data.head() # show first n entries (default is 5)
Out[5]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4

Data exploration

Change default graph appearance to something you like. See here for full list of available built-in styles.

In [6]:
sns.set_style("ticks") # e.g., ggplot, whitegrid, etc.


Plot histograms of tips grouped by sex side by side. Make sure both have the same x and y limits.

In [7]:
data['tip'].hist(by=data['sex'], sharex=True, sharey=True)
sns.despine() # Remove top and right side of box

plt.show() # Somewhat redundant in this context, but suppresses annoying text output.

Plot overlaid histograms.

In [8]:
grouped_by_sex = data.groupby('sex')

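# You can also add several arguments below like bins=20, or normed=True
# (newer matplotlib releases renamed normed to density).
# plot() on a grouped column returns one Axes entry per group (here both entries
# are the same shared Axes), so keep just the last one for formatting.
_, axes = grouped_by_sex['tip'].plot(kind='hist', normed=False, alpha=.5, legend=True)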

# Re-label legend entries, move legend to right-middle
axes.legend(['Men', 'Women'], loc=(0.75, 0.5)) 

sns.despine() 
plt.show()

Show summary stats for the sexes.

In [9]:
grouped_by_sex['tip'].describe()
Out[9]:
count mean std min 25% 50% 75% max
sex
Male 157.0 3.089618 1.489102 1.0 2.0 3.00 3.76 10.0
Female 87.0 2.833448 1.159495 1.0 2.0 2.75 3.50 6.5

ANOVA


Perform an ANOVA, using R-style syntax.

In [10]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

model = 'tip ~ sex * smoker * total_bill'
lm = ols(model, data=data).fit()
table = sm.stats.anova_lm(lm, typ=2)

display(table)
sum_sq df F PR(>F)
sex 0.000077 1.0 0.000080 9.928825e-01
smoker 1.355188 1.0 1.400472 2.378352e-01
sex:smoker 1.181731 1.0 1.221220 2.702468e-01
total_bill 212.691658 1.0 219.798892 1.379847e-35
sex:total_bill 0.257702 1.0 0.266314 6.062986e-01
smoker:total_bill 19.286030 1.0 19.930485 1.244839e-05
sex:smoker:total_bill 0.047207 1.0 0.048784 8.253839e-01
Residual 228.368900 236.0 NaN NaN


Make the table prettier and more intelligible.

In [11]:
from prettypandas import PrettyPandas 

def color_significant_green(val, alpha=0.05):
    if val < alpha: color = 'green'
    else: color = 'black'
    return 'color: %s' % color

def bold_significant(val, alpha=0.05):
    if val < alpha: font_weight = 'bold'
    else: font_weight = 'normal'
    return 'font-weight: %s' % font_weight

t = PrettyPandas(table)
(
    t.applymap(color_significant_green, alpha=.05, subset=['PR(>F)']) # alpha is optional here, of course
    .applymap(bold_significant, alpha=.05, subset=['PR(>F)'])
    .format("{:.3f}", subset=['sum_sq', 'F', 'PR(>F)']) # show only 3 decimal places
)
Out[11]:
sum_sq df F PR(>F)
sex 0.000 1 0.000 0.993
smoker 1.355 1 1.400 0.238
sex:smoker 1.182 1 1.221 0.270
total_bill 212.692 1 219.799 0.000
sex:total_bill 0.258 1 0.266 0.606
smoker:total_bill 19.286 1 19.930 0.000
sex:smoker:total_bill 0.047 1 0.049 0.825
Residual 228.369 236 nan nan
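
applymap calls the styling function once per cell in the chosen subset and uses the returned CSS string for that cell. To make that concrete, the sketch below calls a compact version of `color_significant_green` directly on two p-values copied from the ANOVA table (smoker and smoker:total_bill).

```python
def color_significant_green(val, alpha=0.05):
    """Return a CSS color rule: green if val is below alpha, black otherwise."""
    color = 'green' if val < alpha else 'black'
    return 'color: %s' % color


# p-values for smoker and smoker:total_bill from the ANOVA table
for p in [2.378352e-01, 1.244839e-05]:
    print(p, '->', color_significant_green(p))
```

Comparisons with NaN are always False, so the Residual row (which has no p-value) falls through to black.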

T-tests

In [12]:
from numpy import sqrt
from scipy.stats import ttest_ind

def cohens_d(t, n):
    return 2*t / sqrt(n - 2)

# Set up empty results table
columns = ['n', 't', 'p', 'd']
index = []
results = pd.DataFrame(index=index, columns=columns)

# Get data for t-test
male_tips = data[data['sex']=='Male']['tip']
female_tips = data[data['sex']=='Female']['tip']

# Perform t-test and surrounding calculations
n = male_tips.count() + female_tips.count()
t, p = ttest_ind(male_tips, female_tips)
d = cohens_d(t, n)

# Add data to table
comparison = 'Male vs. Female'
results.loc[comparison] = [n, t, p, d]

# Output pretty table
r = PrettyPandas(results)
(
    r.applymap(color_significant_green, subset=['p'])
    .applymap(bold_significant, subset=['p'])
    .format("{:.3f}", subset=['t', 'p', 'd'])
)
Out[12]:
n t p d
Male vs. Female 244 1.388 0.166 0.178
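
As a sanity check on the effect size: `cohens_d` above uses the conversion d = 2t / sqrt(n - 2), a common approximation that assumes roughly equal group sizes. The sketch below redoes that arithmetic from the rounded values in the table and, for comparison, applies the conversion d = t * sqrt(1/n1 + 1/n2), which takes the unequal group sizes (157 men, 87 women) into account. The second formula is my addition, not part of the original analysis.

```python
from math import sqrt

t = 1.388            # t statistic from the table above (rounded)
n1, n2 = 157, 87     # group sizes from the summary statistics
n = n1 + n2

# Conversion used by cohens_d above
d_approx = 2 * t / sqrt(n - 2)
print(round(d_approx, 3))   # 0.178, matching the table

# Conversion that accounts for unequal group sizes
d_unequal = t * sqrt(1 / n1 + 1 / n2)
print(round(d_unequal, 3))
```

The two values are close here, but they can drift apart as the groups become more unbalanced.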

Repeated measures ANOVA

Requires development version of statsmodels package, available here.

In [13]:
import pandas as pd
import numpy as np
import statsmodels
from statsmodels.stats.anova import AnovaRM
statsmodels.__version__
Out[13]:
'0.8.0.dev0+91ed779'

Create simulated reaction time data for 2 levels of an independent variable.

In [14]:
N = 20
P = [1,2]

values = [998,511]
 
sub_id = [i+1 for i in range(N)]*len(P)
mus = np.concatenate([np.repeat(value, N) for value in values]).tolist()
rt = np.random.normal(mus, scale=112.0, size=N*len(P)).tolist()
iv = np.concatenate([np.array([p]*N) for p in P]).tolist()

df = pd.DataFrame({'id': sub_id, 'rt': rt, 'iv':iv})
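
The simulated reaction times change on every run. If you want reproducible numbers, one option is a seeded random generator, sketched below. This assumes NumPy 1.17 or newer for `default_rng` (with older versions, `np.random.seed` plays a similar role); the function name `simulate_rt` and the seed value are arbitrary.

```python
import numpy as np

N = 20
values = [998, 511]


def simulate_rt(seed):
    """Draw N reaction times per condition from a seeded generator."""
    rng = np.random.default_rng(seed)
    mus = np.repeat(values, N)        # 998 repeated N times, then 511 repeated N times
    return rng.normal(mus, 112.0)     # one draw per mean, standard deviation 112


first = simulate_rt(seed=0)
second = simulate_rt(seed=0)
print(first.shape)
print(np.array_equal(first, second))  # same seed, same draws
```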

Do the repeated measures ANOVA.

In [15]:
aovrm = AnovaRM(df, depvar='rt', subject='id', within=['iv'])
fit = aovrm.fit()
fit.summary()
Out[15]:
F Value Num DF Den DF Pr > F
iv 220.4909 1.0000 19.0000 0.0000
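
With only two levels of the within-subject factor, a repeated measures ANOVA is equivalent to a paired t-test (F = t²). The sketch below checks that identity by hand on freshly simulated data, so its numbers will not match the table above. The seed and variable names are arbitrary, and `default_rng` assumes NumPy 1.17 or newer.

```python
import numpy as np

rng = np.random.default_rng(1)      # arbitrary seed
n = 20
a = rng.normal(998, 112, n)         # condition 1, one value per subject
b = rng.normal(511, 112, n)         # condition 2, same subjects

# Paired t statistic computed from the within-subject differences
diff = a - b
t = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))

# One-way repeated measures ANOVA from sums of squares
x = np.column_stack([a, b])         # shape: (n subjects, k conditions)
k = x.shape[1]
grand = x.mean()
ss_cond = n * ((x.mean(axis=0) - grand) ** 2).sum()
ss_subj = k * ((x.mean(axis=1) - grand) ** 2).sum()
ss_error = ((x - grand) ** 2).sum() - ss_cond - ss_subj
df_cond, df_error = k - 1, (k - 1) * (n - 1)
F = (ss_cond / df_cond) / (ss_error / df_error)

print(df_cond, df_error)            # degrees of freedom: 1 and 19
print(np.isclose(F, t ** 2))
```

The degrees of freedom agree with the Num DF and Den DF columns reported by AnovaRM for 20 subjects and 2 levels.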

Plots

Line graph

Plot simple line graph with sample data.

In [16]:
line_data = range(1,10)

plt.figure()
plt.title("Example Graph", size="xx-large") # can also feed font point size, like 36
plt.xlabel("X-Axis Label", size="x-large")
plt.ylabel("Y-Axis Label", size="x-large")
plt.xlim(0,10)
plt.ylim(0,10)
plt.plot(line_data, 'b*-', markersize=10, linewidth=3, label='Sample Data') # b*- means blue star marker with line
plt.tick_params(axis="both", which="major", labelsize=14)
plt.legend(loc=(0.25, 0.75), scatterpoints=1)
plt.show()

Violin plot and beeswarm plot

Plot violin plot with overlaid beeswarm plot.

In [17]:
fig, ax = plt.subplots()

# Output to the size of A4 paper
fig.set_size_inches(11.7, 8.27)

# Overlay a swarmplot on top of a violinplot
ax = sns.violinplot(x="day", y="total_bill", data=data, inner=None)
ax = sns.swarmplot(x="day", y="total_bill", data=data, color="white")

Bar Plots

In [18]:
def set_titles(thisPlot, titleList, fontSize):
    for ax, title in zip(thisPlot.axes.flat, titleList):
        ax.set_title(title, fontsize=fontSize)

        
def set_labels(thisPlot, xLabel, yLabel, fontSize):
    thisPlot.set_xlabels(xLabel, fontsize=fontSize)
    thisPlot.set_ylabels(yLabel, fontsize=fontSize)

    
def set_xtick_labels(thisPlot, tickList, fontSize):
    thisPlot.set_xticklabels(tickList, fontsize=fontSize)

    
def set_legend(thisPlot, legendEntries, fontSize):
    # find where last graph is so we can put the legend there
    maxIndex = max(thisPlot.axes.shape) - 1
    
    # format the legend, placing it outside the axes
    thisPlot.axes[0][maxIndex].legend(bbox_to_anchor=(1.05, 1), loc=2, 
                                      fontsize=fontSize, borderaxespad=0.)
    legend = thisPlot.axes[0][maxIndex].get_legend()
    labels = legend.get_texts()
    for i, thisLabel in enumerate(labels):
        labels[i].set_text(legendEntries[i])        


# Make plots -- many of these arguments are optional
barPlot = sns.factorplot(x="day", y="total_bill", hue="sex", 
                         col="time", kind="bar", data=data, 
                         size=5, aspect=1, legend=False)

beeswarmPlot = sns.factorplot(x="day", y="total_bill", hue="sex", 
                              col="time", kind="swarm", dodge=True,
                              data=data, size=5, aspect=1, legend=False)

# Format them nicely!
# Axis labels
xLabel = ""# "Day"
yLabel = "Total Bill"
set_labels(barPlot, xLabel, yLabel, 20)
set_labels(beeswarmPlot, xLabel, yLabel, 20)

# Titles
title_list = ["Lunch", "Dinner"]
titles = [x.title() for x in title_list] # capitalize the first letter of each word
set_titles(barPlot, titles, 30)
set_titles(beeswarmPlot, titles, 30)

# X axis tick labels or category labels
x_tick_labels = ["Thursday", "Friday", "Saturday", "Sunday"]
set_xtick_labels(barPlot, x_tick_labels, 15)
set_xtick_labels(beeswarmPlot, x_tick_labels, 15)

# Change legends
legendEntries = ["Male", "Female"]
set_legend(barPlot, legendEntries, 15)
set_legend(beeswarmPlot, legendEntries, 15)

Interactive Plots

Bokeh

Made using bokeh. See here for a great tutorial, and here for the attendant notebook. Code below adapted from linked code to our current dataset.

In [25]:
from bokeh.plotting import figure, output_notebook, show

this_plot= figure(width=600, height=600)

this_plot.circle(x=data['total_bill'], y=data['tip'], size=10, alpha=0.7)
output_notebook() # to output inline 
show(this_plot)
Loading BokehJS ...

Make better, more interactive plot. Let's plot a scatterplot of tip amount vs. total bill, separately for men and women.

See here for more information on styling Bokeh plots.

In [26]:
from bokeh.plotting import figure, output_notebook, show, ColumnDataSource
import bokeh.models.tools as tools

# Get relevant subsets of data
male_data = data[data['sex'] == 'Male']
female_data = data[data['sex'] == 'Female']

# Convert to format bokeh understands
source_male = ColumnDataSource(male_data)
source_female = ColumnDataSource(female_data)

# Set up figure
this_plot = figure(width=600, height=600)

this_plot.circle(source=source_male, x='total_bill', y='tip', color='teal',
         size=10, alpha=0.7, legend='Men')

this_plot.circle(source=source_female, x='total_bill', y='tip', color='darkorange',
         size=10, alpha=0.7, legend='Women')

# Set axis labels
this_plot.xaxis.axis_label = "Total Bill"
this_plot.yaxis.axis_label = "Tip Amount"

# Show information when hovering the mouse over datapoints
this_plot.add_tools(tools.HoverTool(tooltips=[('Day', '@day')])) # use @ to choose feature from dataset

# Hide all circles of a given category when clicked in legend
this_plot.legend.click_policy = 'hide' 

output_notebook() 
show(this_plot)
Loading BokehJS ...
In [28]:
import holoviews as hv
hv.extension('bokeh', 'matplotlib')

ds = hv.Dataset(data, kdims=["sex", "smoker", "total_bill"],
                      vdims=["time", "size", "day", "tip"])
In [29]:
%%output backend='bokeh'
%%output size=200
%%opts Scatter [tools=['hover']] (size=8 alpha=0.5)

kdims=["tip"]
vdims=["total_bill", "day", "time", "size"] # include "smoker" if you don't want it as drop-down choice

# Scatter plot with hover tool that includes all the things
scatter = ds.to(hv.Scatter, kdims, vdims).overlay('sex')
scatter
Out[29]:

Pivot table plots

In [23]:
from pivottablejs import pivot_ui
pivot_ui(data)
Out[23]:

Interactive slider

In [1]:
import matplotlib.pyplot as plt
from ipywidgets import interact
from numpy import pi, arange, sin

t = arange(0, 1.0, 0.01)


def pltsin(f):
    plt.plot(t, sin(2*pi*t*f))
    plt.show()
    
interact(pltsin, f=(1,10,0.1))
Out[1]:
<function __main__.pltsin>

Plotly

Plotly is another package for producing really nice and interactive graphs, but it requires signing up for an account to initialize it. After initialization you can use it online by default (which means all of your graphs get saved to the cloud for everyone to see forever) or you can use it offline (as demoed below). Examples taken or modified from here.

Setup and basic line graph

In [24]:
import plotly
# plotly.tools.set_credentials_file(username='XXX', api_key='XXX') # initialize with your credentials -- only need to do once ever.
from plotly.graph_objs import Scatter, Layout

plotly.offline.init_notebook_mode(connected=True)

plotly.offline.iplot({
    "data": [Scatter(x=[1, 2, 3, 4], y=[4, 3, 2, 1])],
    "layout": Layout(title="hello world")
})

Troubleshooting setup

When I first tried using plotly I sometimes got "IOPub data rate exceeded" errors. Here's how you fix that:

  • run jupyter notebook --generate-config to generate a clean configuration file with all parameters commented out
  • modify c.NotebookApp.iopub_data_rate_limit and c.NotebookApp.iopub_msg_rate_limit to be some absurdly large numbers

Tables

In [25]:
import plotly.offline as py
import plotly.figure_factory as ff

df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv")

table = ff.create_table(df)
py.iplot(table, filename=r'plotly\table1') # raw string, so "\t" is not read as a tab character

Bar graphs

In [26]:
import plotly.offline as py
from plotly.graph_objs import *
data = [Bar(x=df.School,
            y=df.Gap)]

py.iplot(data)
In [27]:
trace_women = Bar(x=df.School,
                  y=df.Women,
                  name='Women',
                  marker=dict(color='#ffcdd2'))

trace_men = Bar(x=df.School,
                y=df.Men,
                name='Men',
                marker=dict(color='#A2D5F2'))

trace_gap = Bar(x=df.School,
                y=df.Gap,
                name='Gap',
                marker=dict(color='#59606D'))

data = [trace_women, trace_men, trace_gap]
layout = Layout(title="Average Earnings for Graduates",
                xaxis=dict(title='School'),
                yaxis=dict(title='Salary (in thousands)'))
fig = Figure(data=data, layout=layout)

py.iplot(fig)

Interactive slider

In [28]:
data = [dict(
        visible = False,
        line=dict(color='#00CED1', width=6),
        name = '𝜈 = '+str(step),
        x = np.arange(0,10,0.01),
        y = np.sin(step*np.arange(0,10,0.01))) for step in np.arange(0,5,0.1)]
data[10]['visible'] = True

steps = []
for i in range(len(data)):
    step = dict(
        method = 'restyle',
        args = ['visible', [False] * len(data)],
    )
    step['args'][1][i] = True # Toggle i'th trace to "visible"
    steps.append(step)

sliders = [dict(
    active = 10,
    currentvalue = {"prefix": "Frequency: "},
    pad = {"t": 50},
    steps = steps
)]

layout = dict(sliders=sliders)
fig = dict(data=data, layout=layout)

py.iplot(fig)
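
The key idea in the slider cell is that each slider step carries a list of booleans, one per trace, with exactly one True. The stripped-down sketch below builds those lists without plotly, using a small made-up number of traces so the output stays readable.

```python
n_traces = 4  # small hypothetical number of traces, to keep the output readable

steps = []
for i in range(n_traces):
    visible = [False] * n_traces   # start with every trace hidden
    visible[i] = True              # then reveal only the i-th trace
    steps.append(dict(method='restyle', args=['visible', visible]))

for step in steps:
    print(step['args'][1])
```

Moving the slider to position i applies step i, which hides every trace except the i-th.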

Interactive 3D Plots

In [29]:
s = np.linspace(0, 2 * np.pi, 240)
t = np.linspace(0, np.pi, 240)
tGrid, sGrid = np.meshgrid(s, t)

r = 2 + np.sin(7 * sGrid + 5 * tGrid)  # r = 2 + sin(7s+5t)
x = r * np.cos(sGrid) * np.sin(tGrid)  # x = r*cos(s)*sin(t)
y = r * np.sin(sGrid) * np.sin(tGrid)  # y = r*sin(s)*sin(t)
z = r * np.cos(tGrid)                  # z = r*cos(t)

surface = Surface(x=x, y=y, z=z)
data = Data([surface])

layout = Layout(
    title='Parametric Plot',
    scene=Scene(
        xaxis=XAxis(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=YAxis(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=ZAxis(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        )
    )
)

fig = Figure(data=data, layout=layout)
py.iplot(fig)

Debugging in Jupyter Notebooks

Use set_trace() where you want the debugger to start. At the debugger prompt:

  • n moves on to the next line
  • c continues execution of the script

In [ ]:
from IPython.core.debugger import set_trace

def increment_value(a):
    a += 1
    set_trace()
    print(a)

increment_value(3)
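
Outside IPython, the standard library's pdb module offers the same n and c commands. A minimal sketch follows, with a hypothetical `debug` flag added so the function can also run straight through without pausing.

```python
import pdb


def increment_value(a, debug=False):
    a += 1
    if debug:
        pdb.set_trace()   # execution pauses here; type n, c, or q at the prompt
    return a


print(increment_value(3))              # runs straight through
# increment_value(3, debug=True)       # uncomment to drop into the debugger
```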

Run R code


Note that this requires running from a Python 3 instance of Jupyter (in my case, at least).

R for Jupyter installation instructions:

  • In R (not RStudio), run the following:
    install.packages('devtools')
    devtools::install_github('IRkernel/IRkernel')
    IRkernel::installspec()  # to register the kernel in the current R installation
    
  • make sure you have R added to your PATH (in my case, C:\Program Files\R\R-3.3.3\bin\x64)
    • Windows: Need R_HOME (same path as above) and R_USER (just your windows user name) added as separate environment vars
  • Install libraries like ggplot2 directly into R itself, not RStudio: install.packages('ggplot2', dependencies=TRUE)
  • pip install rpy2
    • Windows: get appropriate installation from here, and run pip install rpy2-2.8.6-cp36-cp36m-win_amd64.whl (or whatever your .whl file is called) from within the directory that has the file.
  • See here for further information if needed

Example Python to R pipeline

First, make some example data in Python.

In [34]:
import pandas as pd
df = pd.DataFrame({'Letter': ['a', 'a', 'a', 'b','b', 'b', 'c', 'c','c'],
                   'X': [4, 3, 5, 2, 1, 7, 7, 5, 9],
                   'Y': [0, 4, 3, 6, 7, 10, 11, 9, 13],
                   'Z': [1, 2, 3, 1, 2, 3, 1, 2, 3]})


Load extension allowing one to run R code from within a Python notebook.

In [35]:
%load_ext rpy2.ipython


Do stuff in R with cell or line magics. "-i" imports to R, "-o" outputs from R back to Python.

In [36]:
%%R 
install.packages("ggplot2", dep=TRUE)
install.packages("tidyr", dep=TRUE)
install.packages("dplyr", dep=TRUE)
In [37]:
%%R -i df
library("ggplot2")
ggplot(data = df) + geom_point(aes(x = X, y = Y, color = Letter, size = Z))

Run MATLAB code

MATLAB for Jupyter installation

pip install matlab_kernel
pip install pymatbridge

If you're getting a "zmq channel closed" error, open Jupyter notebook on a different port when using MATLAB:

jupyter notebook --port=8889

Example Python to MATLAB pipeline

Load MATLAB extension for running MATLAB code within a Python notebook.

In [38]:
%load_ext pymatbridge
Starting MATLAB on ZMQ socket tcp://127.0.0.1:52061
Send 'exit' command to kill the server
........MATLAB started and connected!


Do MATLAB things with line or cell magics.

In [39]:
%%matlab
a = linspace(0.01,6*pi,100);
plot(sin(a))
grid on
hold on
plot(cos(a),'r')

Exit MATLAB when done.

In [40]:
%unload_ext pymatbridge
MATLAB closed

Run JavaScript code


I have no idea why you would ever need to do this, but here it is anyway. Note that JavaScript executes as the notebook is opened, even if it's been exported as HTML!

In [41]:
%%javascript
console.log('hey!')